Computational Methods for Coptic

Authors

  • Amir Zeldes
  • Caroline T. Schroeder

Abstract

This paper motivates and details the first implementation of a freely available part-of-speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, ancient Egyptian language and literature have until now had few resources for digital and computational work. We evaluate our tag set in an inter-annotator agreement experiment and examine some of the difficulties in tagging Coptic data. Using an existing digital lexicon and a small training corpus taken from several genres of literary Sahidic Coptic in the first half of the first millennium, we evaluate the performance of a stochastic tagger applying a fine-grained and a coarse-grained set of tags within and outside the domain of literary texts. Our results show that a relatively high accuracy of 94-95% correct automatic tag assignment can be reached for literary texts, with substantially worse performance on documentary papyrus data. We also present some preliminary applications of natural language processing to the study of genre, style and authorship attribution in Coptic, and discuss future directions in applying computational linguistics methods to the analysis of Coptic texts.

Similar resources

Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities

This paper motivates and details the first implementation of a freely available part-of-speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, ancient Egyptian language and literature have until now had few resources for digital and computational work. We evalu...

Typesetting Coptic Liturgy in Bohairic

This paper describes what the authors have done to typeset some Coptic texts with LaTeX, mainly in the Bohairic variant used in liturgy. This required creating suitable fonts, macros for typesetting special liturgical symbols, the hyphenation patterns needed to typeset with the Coptic alphabet, and the rules used by the Bohairic variant.

Root and Pattern Morphology in Coptic

The primary goal of this paper is to develop an Optimality Theory (Prince and Smolensky 1993/2004) analysis of the root and pattern morphology of Coptic. Coptic (spoken ca. 300-1300 C.E.) was the last stage of the Ancient Egyptian language, and its root and pattern morphology has not previously been analyzed from a synchronic perspective. Specifically, I aim to determine whether the consonantal...

Coptic Conversion and the Islamization of Egypt (MSR X.2, 2006)

Articles by Gaston Wiet in the 1920s, M. Perlmann in 1942, and Donald Little in 1976 have encouraged the perception that the first century of the Mamluk period marked a turning-point in the history of Coptic conversion to Islam. According to Wiet in his article on the Copts in the Encyclopaedia of Islam: "The government of the Mamluks gave the coup de grâce to Christianity in Egypt," and he goe...

An NLP Pipeline for Coptic

The Coptic language of Hellenistic-era Egypt in the first millennium C.E. is a treasure trove of information for History, Religious Studies, Classics, Linguistics and many other Humanities disciplines. Despite the existence of large amounts of text in the language, comparatively few digital resources have been available, and almost no tools for Natural Language Processing. This paper presents a...

Publication date: 2015